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Abstract 

This  paper  studies  the  problem  of  image  stabilization,  defined  here  as  the  process  of  generating  a 
compensated  video  sequence  where  image  motion  resulting  from  camera  motion  has  been  partially 
or  totally  removed.  The  scheme  combines  various  visual  cues  such  as  points  and  horizon  lines, 
and  relies  on  an  Extended  Kalman  Filter  for  the  estimation  of  parameters  of  interest.  We  study 
both  calibrated  and  uncalibrated  stabilization  cases.  We  address  the  issues  of  local  versus  global 
stabilization.  We  consider  the  problem  of  the  selection  of  model  dynamics  for  the  estimation  of 
warping  parameters  and  illustrate  the  use  of  kinetic  models  for  the  selective  removal  of  oscillatory 
motion.  Experimental  results  from  video  sequences  generated  from  off-road  vehicle  platforms  show 
good  performance  of  the  stabilization  schemes. 

Keywords:  Image  stabilization,  motion  analysis,  image  warping,  integration  of  visual  cues,  kine¬ 
matic  and  kinetic  laws 
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1  Introduction 


Image  stabilization  is  a  key  preprocessing  step  in  dynamic  image  analysis  and  deals  with  the  removal 
of  unwanted  image  motion  in  a  video  sequence.  Depending  on  the  final  application,  this  unwanted 
motion  may  correspond  to  part  or  all  of  the  image  flow  generated  by  the  3-D  camera  motion.  If  the 
ultimate  goal  is  feedback  control  of  the  mechanical  system,  the  undesired  image  motion  corresponds 
to  the  high  frequency  oscillatory  motion  of  the  platform  hosting  the  camera.  Image  stabilization 
then  has  a  purpose  similar  to  that  of  mechanical  stabilization.  If  the  task  consists  of  detecting 
independently  moving  objects,  the  unwanted  image  motion  is  instead  that  motion  resulting  from 
camera  rotation.  If  the  goal  is  mosaicking  and  change  detection  [2,  8],  the  unwanted  image  motion 
corresponds  to  the  total  image  flow. 

Stabilization  is  here  principally  understood  as  the  warping  of  video  sequences  for  the  removal 
of  image  motion  due  to  camera  rotation.  This  procedure  is  important  for  various  reasons.  The 
most  significant  reason  is  that  the  rotational  flow  does  not  convey  structural  information  and  image 
motion  due  to  camera  translation  can  often  be  confused  with  flow  resulting  from  camera  rotation 
[13].  Image  stabilization  is  therefore  useful  for  motion  analysis,  structure  from  motion  [9,  10],  as 
well  as  the  recovery  of  the  Focus  of  Expansion  (FOE)  and  other  structural  information  (such  as 
time  to  collision).  After  performing  stabilization,  simple  and  effective  independent  motion  detection 
mechanisms  can  be  employed  [12].  In  addition,  it  is  important  for  visual  control,  as  well  as  for  other 
image  exploitation  tasks  such  as  registration,  object  detection  [20],  automatic  target  recognition, 
autonomous  vehicle  navigation  [6],  and  model-based  compression  [11]. 

While  achieving  stabilization  is  essential  for  subsequent  motion  analysis  and  recognition  tasks, 
implementing  stabilization  electronically^  is  also  important  for  designing  systems  that  are  viable 
commercially.^  Recently  a  few  promising  approaches  to  image  stabilization  have  been  proposed 


[6,  9,  15].  In  [9],  stabilization  compensates  for  the  rotational  motion  using  a  quasi-projective 
transformation  whose  parameters  are  computed  from  an  assumed  planar  patch.  This  operation  is 
performed  for  subsequent  motion/structure  recovery.  Several  methods  using  normal  flow  or  feature 
trajectories  based  on  2-D  similarity  transformations  or  3-D  motion  models  are  proposed  in  [6]. 
In  [15],  image  stabilization  is  achieved  by  first  aligning  line  segments  extracted  from  an  image 


^  MGchanical  stabilizatioii  ca,ii  bs  done  using  inertial  sensors  and  a  camera  mounted  on  an  isolated  platform, 

however,  this  results  in  prohibitively  costly  systems.  _ 

^For  example,  some  systems  available  in  existing  cameras  use  2-D  models  to  compensate  for  certain  types  of  odCS 

translational  image  motion  with  low  precision  [14].  -  -  - - 
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sequence  with  the  absolute  vertical  direction.  Disparities  between  two  successive  frames  are  then 
compensated  by  2-D  linear  translations.  This  paper  studies  the  use  of  combined  visual  cues  and 
dynamical  models  for  the  stabihzation  of  calibrated  or  uncalibrated  image  sequences. 

Parameters  relevant  for  image  warping  are  estimated  by  combining  information  from  different 
tracked  tokens,  namely  points  and  horizon  lines.  These  parameters  are  simply  the  camera  rota¬ 
tional  velocity  if  intrinsic  camera  parameters  are  available,  or  the  projectivity  coefficients,  in  the 
uncalibrated  case.  Image  plane  displacements  of  distant  feature  points  may  unambiguously  char¬ 
acterize  rotational  motion.  However,  such  points  are  sometimes  difficult  to  detect  and  track,  due 
to  the  absence  of  sufficient  intensity  gradient  information.  For  the  same  reason  flow-based  methods 
suffer  from  a  lack  of  available  visual  information.  Horizon  lines,  when  present,  constitute  on  the 
other  hand  very  strong  visual  cues,  requiring  relatively  simple  operations  for  their  tracking.  These 
tokens  must,  however,  be  combined  so  as  to  remove  all  ambiguity  concerning  camera  motion.  These 
issues  are  addressed  in  the  next  section. 

Image  stabihzation  is  a  process  closely  related  but  not  equivalent  to  image  registration.  Reg¬ 
istration  techniques  can  be  extended  for  stabihzation  purposes.  Image  stabihzation  is  inherently 
different,  however,  in  that  it  ahows  the  use  of  dynamical  information  over  long  temporal  windows. 
In  actual  apphcations,  cameras  are  often  rigidly  mounted  on  platforms.  The  rotation  of  the  camera 
therefore  arises  from  the  rotational  movement  of  the  host  at  ah  times.  It  is  possible  to  employ 
a  kinetic  law  which  captures  the  rotation  of  the  platform  to  model  the  temporal  behavior  of  the 
parameters  of  interest.  The  study  of  the  resulting  warping  parameter  dynamics  therefore  occupies 
an  important  place  in  our  analysis. 

Subsequently,  we  address  the  important  issue  of  the  selection  of  an  appropriate  dynamic  model 
suitable  for  exploiting  the  temporal  information  in  a  sequence.  We  evaluate  analyticaUy  the  use  of 
kinetic  versus  kinematic  laws  for  the  estimation  of  rotational  motion  components.  We  discuss  the 
conditions  under  which  the  use  of  simpler  kinematic  laws  yields  satisfactory  performance. 

When  dynamic  laws  are  available,  the  selective  removal  of  high  frequency  oscillatory  motion 
components  becomes  a  relevant  and  legitimate  problem,  a  point  which  has  not  yet  received  attention 
in  the  literature.  This  question  is  carefully  examined  in  later  sections.  For  instance,  in  applications 
such  as  teleoperation,  it  is  desirable  to  provide  steering  and  chmbing  impressions  to  the  teleoper¬ 
ator,  while  removing  non-smooth  motion.  By  considering  the  kinetic  law  for  the  host,  we  obtain 
additional  insight  into  this  problem  and  show  conditions  under  which  selective  stabilization  may 
be  achieved. 
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These  analytical  results  are  applied  to  the  stabilization  of  images  acquired  from  off- road  vehicles, 
for  which  the  rotational  motion  is  significant.  Specifically,  to  stabilize  a  sequence,  horizon  lines  are 
first  extracted  from  each  frame  and  distant  feature  points  close  to  the  horizon  are  detected.  Both 
types  of  image  primitives  are  tracked  over  the  sequence.  The  matched  lines  and  points  thereafter 
form  a  set  of  visual  cues.  Observations  are  then  used  along  with  a  kinematic  law  to  estimate  the 
needed  warping  parameters.  Based  on  the  estimated  warping  parameters,  a  stabilized  sequence  is 
generated.  The  main  contributions  of  this  paper  are  summarized  as  follows: 

•  Recovery  of  Warping  Parameters:  We  study  the  observability  of  warping  parameters  from 
different  image  primitives  (points  and  hnes  at  infinity)  in  both  calibrated  and  uncalibrated 
cases. 

•  Sensitivity:  We  investigate  the  sensitivity  of  the  warping  parameters  with  respect  to  the 
intrinsic  parameters  in  uncalibrated  cases. 

•  Warping  Schemes:  We  discuss  the  equivalence  of  local  and  global  stabiHzation  schemes.  We 
analyze  the  nature  of  the  residual  motion  in  the  resulting  warped  sequence. 

•  Dynamic  Models:  We  compare  simple  kinematic  versus  more  complex  kinetic  laws  in  describ¬ 
ing  the  behavior  of  the  camera  motion  in  the  presence  of  sufficient  visual  information.  The 
use  of  kinetic  laws  in  selective  stabilization,  i.e.  the  removal  of  oscillatory  motion  components, 
is  presented  as  well. 

•  Integration  of  visual  cues:  Different  visual  cues  are  integrated  for  estimating  the  rotational 
motion  in  a  sequence.  Details  are  given  on  the  implementation  of  the  Extended  Kalman  Filter 
(EKF)  and  on  feature  tracking. 

The  organization  of  this  paper  is  as  follows.  Section  2  discusses  warping  parameter  observability 
from  different  visual  cues.  Residual  motion  of  warped  sequences  is  analyzed  in  Section  3.  Section  4 
addresses  selective  stabilization  and  compares  different  dynamics-based  estimators  for  the  estima¬ 
tion  of  rotation.  An  integrated  approach  exploiting  temporal  information  as  well  as  multiple  visual 
cues  is  presented  in  Section  5.  Section  6  reports  experimental  results  on  real  image  sequences. 
Conclusions  are  given  in  Section  7. 


3 


Y 

I 


Figure  1:  The  camera  motion  and  imaging  model. 
2  Model-guided  Image  Warping  Schemes 


This  section  addresses  the  observability  of  parameters  used  for  image  stabilization  in  both  calibrated 
and  uncalibrated  cases.  We  address  parameter  recovery  from  points  and  horizon  lines.  Consider 
the  scenario  shown  in  Figure  1  where  a  camera  undergoes  rotation  with  instantaneous  angular 
velocity  u?  :  (a;2;,a;y,u;2)^,  and  translation  with  linear  velocity  V  :  (14,1^,14)^.  Let  P  :  {X,Y,Z)'^ 
denote  the  3-D  position  of  a  scene  point  with  respect  to  the  camera,  and  p  :  {x,y)^  the  image 
plane  coordinates  of  the  corresponding  projection  point.  The  relative  motion  of  the  scene  point 
with  respect  to  the  camera  is  then  described  by 

P  =  -wxP-V  (1) 


Assuming  that  the  perspective  projection  is  used  as  an  imaging  model,  p  is  related  to  P  as  follows: 

p  =  P(P)  +  p,  (2) 

where  p^  is  the  intersection  of  the  optical  axis  with  the  image  plane  and  V  denotes  the  perspective 
projection  operator,  i.e. 

P(P)  = 


f  ^ 
Jcz 


(3) 
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with  fc  the  focal  length.  Consequently,  the  image  motion  arising  from  the  camera  movement 
satisfies  (see  Appendix  A  for  derivations) 

P  =  [/J'Hp  -  Pc)^xj^(p  -  Pc)  +  ^z(p  -  Pc)  +  fc<^xy]  +  [(p  -  Pfoe)-j  (4) 

^  - - ^  V - -  — ^ 

Pr  Pt 

where  pfoe  =  'P(V)  +  Pc  and  r  =  Z/T4  respectively  correspond  to  the  Focus  of  Expansion  (FOE) 
and  the  time  to  collision  to  the  imaged  point;  12^  is  the  2  x  2  skew-symmetric  matrix  related  to 
the  rotational  velocity  component  along  the  optical  axis 


0  Wj 
0 


(5) 


and  is  an  image  vector  orthogonal  to  the  projection  of  the  instantaneous  angular  velocity 
component  parallel  to  the  image  plane, 

=  f  (6) 

[  <^x 

When  image  motion  is  described  instantaneously,  image  velocities  p  due  to  3-D  rotation  of  the 
camera  are  expressed  as  second  order  polynomial  functions  of  image  positions  and  are  independent 
of  depth.  For  this  reason,  it  is  a  weU  known  fact  that  image  motion  resulting  from  rotation  can  be 
instantaneously  compensated  for.  Care  has  to  be  taken,  however,  with  regard  to  the  interpretation 
of  the  resulting  derotated  sequence,  a  point  addressed  in  the  next  section.  On  the  other  hand, 
unless  relative  depth  is  known,  or  all  imaged  points  lie  on  the  same  3-D  plane,  translational  motion 
cannot  be  compensated  for. 

Consider  a  distant  point  (i.e.  let  r  — >■  oo)  and  denote  its  position  by  P .  As  seen  from  (4)  such 
points  move  relative  to  the  camera  as  if  only  rotation  were  present.  Therefore  we  may  equivalently 
cLssume  for  these  points  that  the  3-D  motion  simply  satisfies 


P  =  —a?  X  P 


(7) 


For  off-road  vehicle  navigation,  or  images  taken  from  a  plane  or  a  helicopter,  horizon  lines  or  partial 
profiles  of  objects  lying  far  away  constitute  very  strong  visual  cues.  In  Figure  2,  consider  an  image 
horizon  line  denoted  by  C;  C  is  uniquely  characterized  by  W,  the  3-D  vector  normal  to  the  plane  11 
through  £  and  the  camera  center.  Since  the  image  motion  of  horizon  lines  is  explained  exclusively 
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Figure  2:  Geometric  representation  of  a  horizon  line  £,  the  plane  11,  the  3-D  normal  vector  W  and 
the  image  plane  normal  vector  o. 

in  terms  of  the  camera  rotation,  it  follows  that  the  motion  of  the  normal  vector  W  is  itself  described 

by 

W  =  -a?  X  W  (8) 

If  u  is  one  solution,  i.e.  W  =  —u X  W,  then  fcW-hu  also  satisfies  (8)  for  any  k.  Therefore,  observing 
the  image  motion  of  one  horizon  line  characterizes  the  rotational  component  on  plane  11  only.  There 
is  indeterminacy  along  the  direction  W.  Given  one  observation,  the  set  of  possible  solutions  of  (8) 
describes  an  affine  line  in  3-D  rotational  parameter  space.  When  only  one  horizon  fine  is  observed, 
rotation  may  be  determined  in  the  Least  Square  (LS)  sense  which  corresponds  to  the  camera  motion 
with  least  energy  that  explains  the  image  motion  of  that  particular  fine.  In  this  case  rotational 
motion  inducing  image  motion  along  the  line  feature  itself  (lateral  motion)  is  not  always  totally 
compensated  for  (since  it  is  assumed  to  be  zero),  and  other  fines  or  geometrical  cues  near  the 
horizon  can  be  used  to  qualify  lateral  motion  for  full  stabilization.  Indeterminacy  exists  also  if  only 
one  distant  point  is  observed.  In  this  case,  this  indeterminacy  involves  the  rotational  component 
along  a  ray  from  the  image  center  to  this  point.  Quantitatively,  however,  fines  and  points  carry 
equal  amounts  of  information  in  the  determination  of  rotation.  If  any  combination  of  two  of  these 
features  is  observed,  rotation  can  be  fully  characterized,  except  in  some  degenerate  cases  for  which 
W  X  P  —  0,  which  in  practice  cannot  occur  unless  the  observer  possesses  an  unreasonably  large 
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Field  of  View  (FOV).  Letting  w  =  V(W)  +  pc,  one  may  solve  the  over- determined  linear  system 

Qcv  =  D  (9) 

where  D  =  [pf , . . . ,  •  •  •  >  Q  is  a  matrix  derived  from  (4): 

(xi  -  Xc)(t/1  -  Vc)  -(/c  +  (*1  -  Xcf)  (yi  -  Vc) 

ifc  +  fr'^ivi-ycf)  -fr\^i-^c){yi-yc)  -(a^i-xc) 

{xM  -  Xc){yM  -  yc)  -{fc  +  {xM  -  Xcf)  {yu  -  yc) 

Q=  {fc  +  fc^iyM-  yc)^)  -Xc){yM  -yc)  -{xm-Xc)  (lO) 

(Wj;j  -  Xc)(Wyi  -  yc)  -{fc  +  (Wii  -  Xc)'^)  (Wyj  -  t/c) 

{fc  +  ff'^{^yi  -  ycf)  -/r^(Wxi  -  a:c)(Wyi  -  yc)  -(w^i  -  Xc) 

{fc  +  fc  ^{'^VN  ~  yf)^)  ~fc  ^{^^N  ~  ^o){'^yN  ~  yc)  —{'^XN  “  ^c) 

Line  features  can  therefore  be  combined  with  other  tracked  tokens  such  as  distant  points  for  stabi¬ 
lization. 

We  now  discuss  the  image  warping  schemes.  Irrespective  of  the  particular  trajectories  of  the 
vectors  a?(t)  and  V(t),  the  positions  of  3-D  points  at  two  time  instants  can  always  be  described 
by  an  element  of  the  Special  Euclidean  group  SE{Z)  (uniquely,  if  the  rotation  center  is  given),  i.e., 
there  exists  a  total  rotation  R  :  [xij],i  =  l,...,3,j  =  1,...,3  and  translation  T  :  {Tx,Ty,Tz)'^, 
between  any  two  frames,  such  that  the  3-D  point  positions  P i  and  P 2  expressed  in  the  camera 
frame  of  reference  satisfy 

P2  =  RPi  +  T  (11) 

As  before,  for  a  distant  point,  the  contribution  from  the  translation  T  is  negligible.  The  image 
plane  positions  of  such  a  point  at  ti  and  ^2  ^'^e  then  expressed  as 

P2  =  (c^Pi  +  l)"^(Api  -t-  b)  (12) 


where 

A 

b 


,_1  fcTu  +  XcTzi  fcTu  +  Xcr32  ^12 

a  — 

fcr21  +  ycXsi  /c7’22  +  ycX32  021  <*22 

_1  -fcXcTn  -  fcycXl2  +  fhl3  -  xlr3i  -  XcVcTs^  +  fcXc^SS 
a 

-fcXcX21  -  fcycT22  +  fc‘^23  “  “  2/0^32  +  /c2/c’“33 
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c 


(15) 


d-^ 

rzi 

_ 

Cl 

TZ2 

C2 

d  =  -XcTsi  -  ycT32  +  fcTZZ  (16) 


Similarly,  for  a  distant  line  feature  characterized  by  the  projected  normal  vector  w, 

W2  =  (c^wi  +  l)“^(Awi  +  b) 


(17) 


As  seen  from  (12)  and  (17),  for  distant  features,  their  image  plane  motions  are  described  by  pro¬ 
jective  group  operations.  In  fact,  this  is  expected  since  points  on  the  horizon  faU  into  a  plane  at 
infinity  and  the  image  motion  of  planes  is  described  exactly  by  a  projectivity  [1].  Consequently,  if 
{r-jj,  i  =  1, , . 3,  j  =  1, . . .,  3}  in  A,  b  and  c  are  estimated  from  points  and  lines  near  the  horizon, 
we  can  compensate  for  the  rotation  between  two  images  using  the  transformation 


(P2C^-A)  ^(b-p2) 

(18) 

(c^P2  +  l)“^(Ap2  +  b) 

(19) 

where  Pc2  represents  the  points  on  the  compensated  image.  This  is  also  a  projective  transformation 
with  parameters  A,b  and  c,  as  expected  by  virtue  of  the  projective  group  property.  It  is  shown 
later  that  there  exists  only  a  translation  between  stabilized  images.  Therefore,  (19)  forms  the  basis 
of  our  stabilization  scheme  for  calibrated  sequences. 

Note  that  simpler  image  transformations  have  been  used  in  the  literature  for  stabilization  pur¬ 
poses:  the  SE(2)  group  of  transformation  and  affine  transformations,  i.e.  Pc2  =  Ap2  +  b,  are  some 
examples  used  in  works  such  as  [6,  9];^  these  are  essentially  appropriate  in  cases  of  parallel- frontal 
motion. 

In  cases  where  the  intrinsic  parameters  are  unknown,  we  show  that  it  is  stiU  possible  to  achieve 
image  stabilization  using  both  distant  points  and  lines.  For  points,  since  p  is  directly  measurable 
from  the  images,  the  projective  transformation  in  (12)  remains  unchanged.  However,  due  to  the 
unknown  Pc  and  /c,  the  mapping  for  distant  fines  in  (17)  is  no  longer  applicable  since  w  is  no 
longer  measurable.  Consider  instead  the  measured  image  normal  vector  to  a  fine  near  the  horizon 
described  in  the  image  plane  by 

o^p  =  1  (20) 


^In  [9],  a  second  order  polynomial  quasi-pro jective  transformation  is  assumed,  Pct  ^ 
transformation  is  then  used  for  image  warping. 


Cpp^d  -h  Ap  4-  b  an  aflfine 


It  is  shown  in  Appendix  B  that  o  is  related  to  the  previously  defined  projected  normal  w  by 

w  = -(1- <  Pc,o  >)“VcO  +  Pc  (21) 

Using  the  identities  o^p2  =  1  and  ofpi  =  1  along  with  (12)  we  can  further  show  that  the  movement 
of  o  also  satisfies  a  projective  transformation  whose  parameters  are  related  to  the  inverse  projective 
transformation  A,  b,  c  in  (19): 

02  =  (— b^Oi  +  1)  ^(A^oi  —  c)  (22) 


Consequently,  image  stabilization  of  uncalibrated  sequences  can  be  carried  out  by  estimating  the 
eight  parameters  in  A,b  and  c  from  distant  points  and  lines.  While  this  is  possible  in  principle, 
in  practice,  the  computation  of  the  projectivity  parameters  from  distant  features  can  be  unstable. 
To  see  this,  consider  the  system  obtained  from  (12), 


Gf  =  q 


(23) 


where  f  is  the  vector  including  the  eight  projectivity  parameters,  f  =  (an, . .  .,02)^,  q  consists  of 
the  coordinates  of  a  set  of  points  such  as  p2,  and  G  is  a  matrix  with  elements  composed  of  the 
coordinates  of  pi  and  p2.  This  matrix  is  often  ill  conditioned;  it  can  be  shown  that  the  last  two 
columns  of  G  contain  second  order  terms  in  the  image  coordinates,  while  the  third  and  sixth  columns 
contain  zeroth  order  terms.  Geometrically,  since  the  eight  parameters  are  computed  from  features 
lying  only  on  a  constrained  region  of  the  image  (namely  close  to  the  horizon),  there  exist  many 
projectivities  leaving  the  horizon  invariant,  some  of  which  are  not  suitable.  One  possible  solution 
to  this  problem  is  to  further  constrain  the  intrinsic  parameters  by  assuming  an  approximate  value 
and  concentrating  on  estimation  of  the  rotation. 

Indeed,  there  often  are  situations  where  the  intrinsic  parameters  are  approximately  known."*  In 
these  cases,  denote  the  intrinsic  parameters  by  A=  (/c,  a;c,  2/c)^»  and  let  the  nominal  values  be  Aq. 
If  we  further  assume  that  the  eight  projective  coefficients  vary  smoothly  with  respect  to  A  then 


f(A) 


f(Ao)+g 


(A-Ao) 


f(^o)  +  J (•^o)(-^  ~  •^o) 


(24) 

(25) 


where  J  is  the  Jacobian  matrix.  When  the  elements  of  J  are  small,  the  effect  of  the  imperfect 
knowledge  of  the  intrinsic  parameters  is  negligible  for  stabilization  purposes.  A  small  error  in 

^In  fact,  the  errors  in  these  parameters  can  be  moderate,  as  shown  in  the  experiments  later. 
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the  assumed  intrinsic  parameters  will  still  lead  to  acceptable  stabilization  results.  For  example, 
consider  the  sensitivity  of  bi  with  respect  to  /c-  Then  from  (14)  and  set  (a^cj/c)  =  (0,0),  we  have 

Wc  " 

—Uy  (27) 

where  (27)  is  typically  true  since  the  rotation  between  two  consecutive  frames  is  small,  say  on  the 
order  of  10“^.  Consequently,  if  the  error  in  fc  is  within  100  pixels,  the  error  in  bi  wiU  be  within 
one  pixel.  Therefore,  we  can  concentrate  on  estimating  the  three  rotational  parameters  as  in  the 
case  of  calibrated  sequences.  It  will  be  seen  later  that  this  assumption  holds  in  real  applications. 

3  Residual  Motion  Analysis 

We  analyze  here  the  nature  of  the  resulting  sequence  when  the  image  warping  described  in  the 
previous  section  is  applied.  We  further  compare  two  types  of  stabilization:  local  stabilization  and 
global  stabilization.  For  convenience,  denote  the  sequence  of  original  images  by  <S  =  {/q?  -fi?  •  •  •  ?  ^n} 
and  caU  the  compensated  sequence  Sc  =  {/o? ? -fen}*  To  obtain  a  translation-only  sequence  S^ 
either  a  local  or  global  stabilization  scheme  can  be  utilized.  The  local  approach  directly  compensates 
for  the  rotation  between  and  Ick  to  generate  while  the  global  method  computes  the 

rotation  between  Ik+i  and  Ik- 

Consider  first  the  residual  motion  in  a  sequence  generated  by  a  local  stabilization  scheme.  Since 
every  frame  IkJ^i  is  directly  stabilized  with  respect  to  the  previous  compensated  image  Ick^  we  have 

P  ck+l  =  R^+l,ArPA;+l  (28) 

where  R^,+i  ^  denotes  the  rotation  between  Ick  and  /at+i,  while  Pk+i  and  Pc/c+i  respectively  rep¬ 
resent  the  3-D  coordinates  of  a  scene  point  relative  to  the  camera  coordinate  system  in  Ik+i  and 
Icki-i’  Also,  recall  that  the  motion  of  Pk  in  the  original  sequence  is  described  by 

Pk+i  =  'R'k+l.k'Pk  +  T^k+i,k  (29) 

with  R^--j-i,A;  and  TkJ\-i,k  the  rotation  and  translation  between  the  camera  frames  of  reference  in  Ik 
and  Ik-\-i’  Substituting  (29)  into  (28)  and  using  (28)  to  relate  Pek  and  P/:,  the  residual  motion 
between  Ick+i  and  Ick  can  be  expressed  as  follows: 

Pc/:+l  —  P  ck  +  P^4.1,A;Ta:-}-1,A-  (30) 
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Furthermore,  is  related  to  the  rotation  in  the  original  sequence  by 

=  Rfc4.i,i;R/fc,fc_i  •  •  •Ri,o  (31) 

Therefore,  as  expected,  the  compensated  sequence  exhibits  a  purely  translational  motion.  If 
is  known,  the  compensated  sequence  can  be  used  for  ego-motion  recovery.  Observe,  however, 
that  while  the  magnitude  of  the  translation  in  the  stabilized  sequence  is  the  same  as  that  of  the 
translation  magnitude  in  the  original  sequence,  the  apparent  translational  heading  (and  therefore 
the  FOE)  now  rotates  with  the  original  rotational  motion. 

This  local  stabilization  scheme,  however,  is  not  practical  for  real  applications.  As  seen  from 
(31),  Rfc+i  fc  in  fact  accounts  for  the  rotation  between  Jo  and  Jfc+i.  For  large  k,  the  motion  between 
Jo  and  IkJri  is  likely  to  be  large.  The  overlap  between  Ick  and  Jfc+i  may  therefore  be  small.  The 
common  features  are  more  difficult  to  find  and  consequently,  it  is  not  easy  to  compute  Rfc4.i^jt 
reliably.  Global  stabilization,  on  the  other  hand,  focuses  on  the  estimation  of  the  rotation  between 
Ik  and  Jfc+i;  the  stabilized  image  is  created  afterwards  with  respect  to  the  reference  frame,  say  Jo, 
according  to 

Pcfc-n  =  (Rfc+i,fcRM-i  •  •  •  Ri,o)“'Pfc+i  (32) 

The  sequence  generated  by  the  global  stabilization  scheme  therefore  exhibits  the  same  residual 
motion  as  the  sequence  obtained  using  the  local  stabilization  approach.  However,  in  contrast  with 
the  local  approach,  the  global  stabilization  scheme  is  more  likely  to  provide  reUable  estimates  of 
the  parameters  of  interest  in  real  applications,  since  the  area  of  overlap  between  h  and  J^+j  is 
greater.  This  scheme  is  therefore  employed  in  our  work. 

4  Exploitation  of  Dynamics  for  Full  or  Selective  Stabilization 

The  next  issue  involves  the  robust  estimation  of  the  warping  parameters  used  in  the  stabilization 
scheme  presented  above.  As  argued  earber,  the  inherent  nature  of  stabiUzation  implies  that  tempo¬ 
ral  information  present  in  the  sequence  should  be  exploited  for  image  warping  purposes.  A  relevant 
problem  then  becomes  that  of  selecting  suitable  parameter  dynamics  so  as  to  exploit  this  temporal 
information.  We  concentrate  on  the  dynamics  for  the  rotation  parameters.  The  incorporation  of 
the  proposed  dynamic  laws  into  the  estimation  process  is  presented  later  in  Section  5. 
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We  consider  here  dynamic  laws  appropriate  for  capturing  the  evolution  of  rotational  motion  of 
the  platform  on  which  the  camera  is  mounted.  Since  the  movement  of  the  platform  is  affected  by 
the  interaction  between  mechanical  elements  and  the  environment,  the  motion  is  best  described 
by  a  kinetic  model  which  emulates  the  behavior  of  the  mechanical  system.  The  use  of  simpler 
kinematic  models  is  compared  to  that  of  more  complete  kinetic  models.  These  ideas  are  studied 
for  the  case  of  a  vehicle  performing  off-road  navigation. 

Furthermore,  due  to  the  close  relationship  between  kinetic  models  and  mechanical  stabilization, 
the  possibility  of  employing  kinetic  models  for  the  removal  of  oscillatory  image  motion,  or  so- 
called  selective  stabilization,  now  becomes  viable.  It  will  be  shown  that  the  needs  of  selective 
stabilization  are  different  from  those  of  total  stabihzation.  More  specifically,  in  order  to  achieve 
selective  stabilization,  features  close  to  the  camera,  providing  translational  information,  need  to  be 
considered  as  well. 

This  section  therefore  starts  with  the  description  of  a  kinetic  model  of  a  vehicle.  The  presenta¬ 
tion  of  this  model  leads  to  the  study  of  selective  stabihzation.  Subsequently,  using  this  model,  the 
appropriateness  of  kinematic  laws  for  describing  the  evolution  of  rotational  parameters  is  evaluated. 

4.1  Kinetic  Models  and  Selective  Stabilization 

The  movement  of  a  vehicle  over  rough  terrain,  in  general,  can  be  decomposed  into  two  components: 
the  smooth  motion  and  the  oscillatory  motion.  The  smooth  motion  corresponds  to  the  behavior  of 
the  vehicle  as  if  the  terrain  were  smooth;  it  includes  translation,  as  well  as  rotation  due  to  steering 
and  chmbing.  The  oscillatory  motion,  on  the  other  hand,  refers  to  the  residual  vehicular  motion; 
it  characterizes  the  response  of  the  vehicle  to  the  roughness  of  the  terrain. 

The  removal  of  the  unwanted  oscillatory  motion  is  important  for  many  applications,  and  is 
referred  to  here  as  selective  stabilization.^  For  an  active  vision  system,  the  separation  of  oscillatory 
motion  from  smooth  motion  is  useful  for  achieving  fixation.  Another  example  where  selective 
stabilization  is  important  is  teleoperation  in  which  the  vehicle  needs  to  be  remotely  controlled.  An 
image  sequence  unperturbed  by  the  oscillatory  motion,  while  stiU  preserving  smooth  motion,  is 
highly  desirable:  the  teleoperator  needs  to  fuUy  evaluate  the  effects  of  chmbing  and  steering.  In 
addition,  while  an  approximate  selective  stabihzation  scheme  could  be  attempted  by  simple  low- 
pass  filtering  of  the  computed  total  rotational  components,  a  scheme  which  more  closely  resembles 

^Note  that  we  use  the  term  selective  stabilization  in  a  different  sense  than  [3]. 
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Figure  3:  The  four-wheel  vehicle  model  [7]. 


mechanical  stabilization  is  of  more  interest. 

We  proceed  by  employing  a  four-wheel  vehicle  model  to  account  for  the  oscillatory  vehicular 
movement.  This  model  takes  into  account  the  phenomena  of  bounce,  pitch  and  roll  (illustrated 
in  Figure  3),  and  has  been  widely  used  for  the  design  and  analysis  of  suspension  systems.  All 
tires  are  modeled  by  linear  springs  with  the  same  stiffness  coefficient  Kj-  M^f  and  M^r  represent 
the  masses  of  unsprung  elements  such  as  the  front  and  rear  wheels  and  their  axles.  K  j  Kr 

and  Cr  are  the  characteristics  of  the  linear  springs  and  shock  absorbers  modeling  the  suspension 
system.  Assume  that  each  tire  contacts  the  terrain  at  a  point  at  all  times  and  the  movements  of 
unsprung  elements  {xi,  2:3,  X5,  X7}  are  measurable  (for  example,  by  placing  accelerometers  on  these 
components);  then  three  degrees  of  freedom  remain  in  this  model:  the  bouncing  displacement  of 
the  center  of  gravity  of  the  sprung  element  Xc,  the  pitch  angle  0  and  the  roll  angle  4>.  The  high 
frequency  yaw  motion  is  usually  small  during  driving;  it  can  therefore  be  neglected.  Note  that 
because  of  the  decomposition  of  the  vehicle’s  movement,  these  oscillatory  states  are  measured  with 
respect  to  the  equilibrium  positions  resulting  from  the  smooth  motion.  Then  the  dynamics  of  the 
oscillatory  components  are  expressed  by  [16] 

=  ^US^US  "F  (33) 

where  x„s  is  the  state  vector  consisting  of  the  oscillatory  components  of  interest, 

^us  =  {xc,ic,0,e,cf>jf  (34) 
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while  describes  the  measurable  movements  of  the  unsprung  elements, 

(xi.xi.xs,  is,  xs,  xs,X7,  xj)'^  (35) 

^us  Tus  respectively  denote  constant  matrices  whose  entries  are  related  to  system  parameters 
such  as  the  mass  of  the  sprung  element,  the  moment  of  inertia,  the  spring  characteristics,  etc. 

We  now  turn  to  the  problem  of  selective  stabilization  using  the  oscillatory  motion  model  given  in 
(33).  Assume  that  a  camera  is  rigidly  mounted  on  the  vehicle,  and  define  the  stabihzed  coordinate 
system  V(/)  which  describes  the  motion  of  the  camera  as  if  only  smooth  motion  were  present. 
Consider  also  the  unstabilized  coordinate  system  V^(it),  which  characterizes  the  total  camera  motion, 
or  equivalently,  the  total  vehicular  motion.  Then  for  a  point  in  the  scene,  the  movement  due  to  the 
smooth  motion  is  described  by 

FviU)  =  Rs{U)[Pv{U-i)  +  TsiU)]  (36) 

where  Pv(*)  is  the  position  with  respect  to  the  stabilized  coordinate  system,  while  Rs(^{)  and  Ts{ti) 
account  for  the  rotation  and  translation  of  the  stabihzed  coordinate  system  between  ti-i  and  ti. 
Subsequently,  due  to  the  additional  oscillatory  motion,  the  coordinates  of  the  point  with  respect 
to  the  unstabihzed  coordinate  system,  Pv/(4),  are  expressed  as 

Pv^(^z)  =  Rzis(^5  ^2)[Pv(^2)  ~  Tu5(^?)]  (^'^) 

where  Rusi^y  <!>'’>  U)  ahgns  both  coordinate  systems  at  the  instant  and  T^5(t^)  represents  the 
additional  translation  due  to  the  bouncing  motion.  Substituting  (36)  into  (37),  we  obtain  the 
motion  of  the  point  due  to  the  total  movement  of  the  vehicle  as 

Pv'(^0  ~  Ri,2-iPv'(^^“i)  "b  Ri(T5(^i)  +  T^5(t^_i))  —  Rti5(^, <p;  ^2)T^5(^i)  (38) 

where 

(39) 

Ri  =  (40) 

The  rotation  Ri,i_i  aligns  the  coordinate  system  V'(^i-i)  with  V'(tj)  while  Rj-  aligns  V(ti_i)  with 

Assume  that  the  movement  of  the  vehicle  is  known  up  to  time  instant  tj_i,  i.e.  Rusi^,<Pi  it-i)> 
T„s(ti_i),  and  Pv'(tt-i)  are  known.  Then,  in  order  to  differentiate  between  oscillatory  motion 
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and  smooth  movement  components,  the  translational  components  and  Ts{ti)  need  to  be 

characterized.  This  observation  points  to  a  basic  difference  between  selective  and  full  stabihzation. 
Different  strategies  therefore  need  to  be  employed  for  selective  stabilization.  More  specifically,  in 
this  case,  points  close  to  the  vehicle,  whose  image  motions  are  affected  by  the  translational  platform 
movement,  need  to  be  considered.  When  only  the  motions  of  distant  features  are  considered,  (38) 
simphfies  to 

Pv'(4)  = 

showing  instead  that  the  total  rotation  between  two  instants  is  observable  from  distant  features 
alone. 


4.2  Rotation  Dynamics 


We  turn  now  to  the  evaluation  of  kinetic  models  (as  compared  with  simple  kinematic  models)  to 
capture  the  evolution  of  rotational  motion.  The  simple  models  are  sometimes  casually  used  without 
further  examination.  We  wiU  show  analytically  that  kinetic  models  are  superior  to  kinematic 
models,  but  that  kinematic  models  yield  acceptable  performance  if  sufficient  numbers  of  visual 


observations  are  available. 

As  mentioned  previously,  the  attitude  change  between  two  time  instants  is  due  to  smooth 
rotation  and  oscillatory  motion.  Rotational  motion  components  such  as  rotation  due  to  climbing 
occur  only  during  limited  time  intervals;  they  are  therefore  neghgible  most  of  the  time.  We  will 
study  the  performance  of  a  kinematic-law  based  estimator  for  the  estimation  of  pitch  and  roU  motion 
only.  The  estimator  can  easily  be  generalized  to  take  into  account  all  the  rotational  components 
(including  the  smooth  motion  components). 

Since  only  oscillatory  rotation  is  being  investigated  here,  for  convenience  we  can  rewrite  ^us  iii 
(33)  as 


$ 


US 


^cr 

^rc 


(42) 


where  and  denote  constant  matrices  with  respective  sizes  2x2, 2x4, 4x2  and 

4x4.  Define  a  constant  matrix  G  such  that 


y  = 


(43) 


with  y  =  [0,9,<t>,^]'^ .  For  simphcity,  the  measurements  z  available  for  the  estimation  of  y  are 
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related  to  y  by  a  linear  transformation  C: 


z  =  Cy 


(44) 


Assuming  that  the  oscillatory  rotational  motion  component  is  exactly  described  by  the  dynamics 
of  y,  we  compare  the  performance  of  estimators  based  on  different  dynamics  when  additional 
measurements  are  available.  The  kinetic-law  based  estimator  captures  the  oscillatory  motion  using 
coefficients  related  to  the  true  dynamics  [5]: 

Yi  =  +  Ki[z  -  Cyi]  (45) 


where  yi  represents  the  resulting  estimate  of  y,  and  Ki  is  the  desired  gain  matrix.  The  kinematic- 
law  based  estimator,  on  the  other  hand,  assumes  no  knowledge  of  the  underlying  mechanical  system 
and  therefore  only  employs  the  smooth  variation  dynamics 

72  =  ^72  +  K2[z  -  Cy2]  (46) 


where  y2  and  K2  respectively  have  meanings  similar  to  those  of  yi  and  Kj ,  while 


^01  0 

0  $01 


with  0  the  2x2  zero  matrix,  and 


^01 


0  1 
0  0 


Let  us  define  the  corresponding  estimation  errors  as 


Yi  =  Y  -  Yi 
72  =  7-72 


It  can  be  shown  that  yi  and  y2  satisfy 


Yi  =  ($r-KiC)yi  +  u 

y2  =  (^  -  K2C)y2  -f  U  -  A$ry2 


(47) 


(48) 


(49) 

(50) 


(51) 

(52) 


where  u  is  related  to  x,^,,  Xc  and  Xc  by 


U  =  GTu^X^  +  $rcX6 


(53) 
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with  Xj,  =  [xc,x^ ,  while  is  due  to  mismatches  between  the  assumed  and  true  dynamics: 

=  ^-^T  (54) 

As  seen  from  (51),  if  Ki  is  chosen  so  that  the  eigenvalues  of  -  KiC  have  negative  real  parts, 
then  yi  remains  bounded  as  long  as  u  is  bounded.  Similarly,  in  addition  to  u  being  bounded,  if 
A$y2  is  also  bounded,  we  can  carefully  choose  K2  so  that  y2  is  bounded.  Nonetheless,  y2  wiU 
exhibit  a  larger  error  than  yi. 

In  sum,  kinetic-law  based  estimators  yi  outperform  kinematic-law  based  estimators  y2.  The 
kinetic-law  estimator,  however,  requires  knowledge  of  the  mechanical  system  parameters.  These 
parameters  are  not  always  easily  measurable.  When  the  system  parameters  are  unknown,  the 
availability  of  sufficient  numbers  of  visual  measurements  should  allow  for  the  use  of  the  simpler 
kinematic  law  while  still  yielding  good  warping  parameter  estimates.  The  next  section  employs  the 
kinematic  laws  for  the  estimation  of  total  rotation. 

5  Parameter  Estimation 

Various  types  of  estimators  can  be  employed  to  obtain  the  parameters  used  in  our  stabilization 
scheme.  Recursive-type  estimators  update  the  estimates  of  the  parameters  whenever  new  infor¬ 
mation  becomes  available  while  batch-type  estimators  compute  the  estimates  by  processing  all  the 
information.  Because  of  their  ability  to  process  data  sequentially  and  their  lower  computational 
complexity,  recursive-type  estimators  are  preferred  over  batch  estimators  for  real  time  processing. 
Kalman  filters  make  use  of  dynamics  for  their  estimation.  We  therefore  provide  Extended  Kalman 
Filter  (EKF)  formulations  for  both  the  calibrated  and  uncalibrated  stabilization  cases. 

For  both  calibrated  and  uncalibrated  sequences,  the  algorithm  only  needs  to  estimate  three 
rotational  parameters,  and  the  state  vector  x  is  simply  equal  to 

X  =  oj  (55) 

Based  on  the  simple  kinematic  law  justified  in  the  previous  section,  we  have 

X  =  0  (56) 

Subsequently, 

x(ti+i)  =  x(ti)  (57) 
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This  constitutes  the  plant  equation  in  our  recursive  estimation  algorithm.  We  turn  next  to  the 
observation  equations.  Assume  that  the  tracked  tokens  are  composed  of  M  points  (whose  projection 
points  are  pi, . . . ,  pm)  and  N  horizon  Unes  (each  one  with  associated  normal  vector  w,-,  ?  =  1, . . . ,  iV 
for  calibrated  sequences,  or  alternatively  with  given  o,,  i  =  1, . . . ,  A  for  uncahbrated  sequences,  with 
Oi  defined  as  in  (20)).  The  measurement  vector  in  the  cahbrated  case  is  defined  by 

z  =  (pf,...,p^,wf,...,w^)^  (58) 

From  (12)  and  (17)  we  can  write  the  measurement  equations  as 

z{ti+i)  =  hi+i,e'[x(fi+i)]  +  n(ti+i)  (59) 


where  h  is  a  nonlinear  function  while  n  denotes  the  measurement  noise.  More  specifically,  in  the 
calibrated  case,  hjq.i,j[x(fj+i)]  is  expressed  as 

(c^Pi  +  l)"HApi  +  b) 

(c^PM  +  l)‘‘^(ApM  +  b) 

(c^wi  +  l)"’-(Awi  +  b) 


hi+i,i[x(fi+i)]  = 


(c^Wyv  +  1)  HAw;V  +  b) 
with  A,  b,  and  c  expressed  with  respect  to  the  state  vector  components  as 


A  =  d- 


fcni  +  XcTsi  /cri2  +  Xcr32 
/c7’21  +  Vcrsi  /cr-22  +  ycr32 


On  ai2 
^21  022 


d  -  -XcTsi  -  ycrz2  +  fcnz 

The  measurement  vector  in  the  uncalibrated  case  is  instead  defined  by 

z  =  (pf,...,p^,of,...,o^)^ 


(60) 


b  =  d-^ 

-fcXcTzi 

-  fcVcriz  +  fc^is  -  -  x^Vcrsz  +  fcXcrss 

-  fcVc'TZZ  +  /c  7-23  -  X^VcTzi  -  +  fcVcTZZ 

bx 

62 

c  =  d~^ 

1  1 

CO  CO 

1 _ 1 

Cl 

C2 

(61) 

(62) 

(63) 

(64) 

(65) 
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and  the  observation  equation  then  uses 


(c^Pi  +  1)  HApi  +  b) 


(c^PM  +  1)  HApM  +  b) 

(-b^oi  +  l)"^(A^oi  -  c) 


(-b^Oiv  +  1)  ^(A^otv-c) 


with  A,  b,  and  c  derived  from  the  identities 


(66) 


A^  =  d-^ 

(^22  ~  ^2^2  ^2^1  *“  <^21 

hiC2  ai2  ^11  ~  ^1^1 

(67) 

\ 

II 

a22Ci  —  a2lC2 

aiiC2  -  ai2Ci 

(68) 

-b  = 

®22^1  “  ®12^2 

01162  —  021^1 

(69) 

d  =  0^11(^22  ““  ^12^21 

(70) 

With  the  plant  and  measurement  equations  given  in  (57)  and  (59),  when  horizon  lines  and 
points  are  tracked,  the  EKF  scheme  can  be  applied  to  recursively  estimate  the  two- frame  angular 
velocity.  This  consists  of  the  following  steps: 


•  Step  1:  State  and  covariance  propagation 

x(ir+i)  =  x(t+) 

S(t-+i)  =  s(tf)  +  s,,(t,+i) 

where  x(tf)  and  S(t+)  denote  the  estimates  of  x(t,)  and  the  associated  covariances:  they 
are  obtained  based  on  information  contained  in  the  sequence  up  to  the  frame.  x(t“).i) 
and  S(t,^j),  on  the  other  hand,  are  the  predicted  estimates  of  x(ti+i)  and  the  predicted 
covariances  respectively  before  the  incorporation  of  the  {i  +  1)‘'^  frame,  while  Su;(4+i)  is  the 
covariance  of  the  plant  noise  w(t,q-i). 
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step  2:  State  and  covariance  update 


S(i+,)  =  [I-K(fHi)H,-+i,i]S(f-^J 


where  x(f,t|.i)  is  the  desired  estimate  of  x(tj+i),  and  S(t^j)  is  the  associated  covariance. 
K(ti+i),  S„(ti4_i)  and  I  respectively  represent  the  gain  matrix,  the  covariance  of  n(ti+i)  and 


the  identity  matrix.  Hj+i,,-,  on  the  other  hand,  is  the  Hnearized  approximation  of 


H 


9h 


!+l,i  — 


i+l,i 


5x(ti+i) 


x(f- 


+  1^ 


(73) 


The  algorithm  is  initialized  by  a  batch  process  using  an  LS  estimate  of  the  rotational  parameters 
as  in 

min  V  |iz(4-+i)  -  h;+i,i[x(ti+i)]|p  (74) 

x(*i)  . 

The  minimization  can  be  solved  using  techniques  based  on  gradient  descent,  such  as  conjugate 
descent. 


This  concludes  our  discussion  of  parameter  estimation.  The  next  section  addresses  image  prim¬ 
itive  tracking.  Synthetic  and  real  experiments  are  reported  as  well. 


6  Experimental  Results 

We  first  briefly  describe  approaches  to  the  detection  and  tracking  of  horizon  lines  and  points. 
Experimental  results  on  two  real  image  sequences  are  then  presented. 

6,1  Detection/Tracking  of  Image  Primitives 

The  first  set  of  visual  cues  for  characterizing  the  rotation  consists  of  horizon  lines.  There  have 
been  numerous  approaches  to  tracking  a  set  of  line  segments  over  a  sequence  [4,  18].  For  simplicity, 
we  only  focus  on  tracking  one  line  in  our  work,  although  using  a  set  of  lines  is  desirable  in  some 
situations. 

Assuming  that  the  lines  near  the  horizon  appear  in  the  form  of  large  vertical  brightness  deriva¬ 
tives,  the  detection  of  the  line  of  interest  is  achieved  in  three  steps: 

•  Step  1:  Based  on  the  histogram  of  the  image,  a  binary  image  is  generated  by  thresholding 
the  original  image. 
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•  step  2:  In  the  binary  image,  the  pixels  with  large  vertical  brightness  derivatives  most  likely 
constitute  points  on  the  horizon  line.  A  simple  template  matching  technique  is  applied  to 
identify  pixels  with  large  brightness  derivatives. 

•  Step  3:  A  median  fit  method  is  used  to  robustly  group  identified  pixels  into  lines.  The 
longest  line  is  taken  to  be  the  line  of  interest. 

In  the  experiments,  the  above  procedures  are  applied  to  each  image  in  the  sequence.  The  line 
extracted  from  each  image  is  assumed  to  be  near  the  horizon,  and  these  lines  are  matched  to  each 
other. 

The  second  class  of  inputs  to  the  full  stabilization  scheme  are  the  image  plane  trajectories  of 
a  set  of  points  near  the  horizon.  A  localized  tracking  algorithm  reported  in  [17]  is  employed  to 
obtain  feature  point  trajectories.  In  particular,  a  Gabor-wavelet  based  feature  extraction  algorithm 
is  initially  applied  to  the  first  image.  Then,  according  to  the  detection  of  lines  near  the  horizon, 
feature  points  of  interest  are  selected  to  initiate  the  tracking  algorithm. 

For  each  feature  point,  the  algorithm  employs  a  trajectory  model  to  exploit  the  temporal  infor¬ 
mation  and  decomposes  the  sequence-tracking  problem  into  successive  two-frame  tracking  problems. 
Three  steps  are  involved  in  finding  the  matching  point  of  a  tracked  feature  point  in  the  following 
frame: 

•  Step  1;  Based  on  the  trajectory  model,  the  first  step  uses  a  probabilistic  data  association 
technique  to  estimate  the  inter-frame  motion.  The  resulting  estimates  are  used  in  the  next 
step  to  facilitate  the  process  of  finding  the  corresponding  point  in  the  following  frame. 

•  Step  2:  The  second  step  applies  a  correlation-type  matching  algorithm  to  identify  the  cor¬ 
responding  point.  The  inter-frame  motion  estimates  are  first  used  to  compensate  for  the 
rotation  as  well  as  to  reduce  the  search  area  for  the  matching  point.  Various  criteria  [19]  are 
then  applied  to  find  the  matching  point  to  sub-pixel  accuracy. 

•  Step  3:  The  third  step  processes  the  temporal  information  contained  in  the  new  frame  by 
updating  the  parameters  in  the  trajectory  model  using  a  Kalman  filter.  Afterwards,  the 
algorithm  goes  back  to  the  first  step  and  continues  to  track  the  point  in  the  following  frame. 

Under  the  assumption  that  the  tracked  points  exist  at  aU  times,  the  feature  point  tracking 
algorithm  can  identify  the  corresponding  points  over  a  sequence.  However,  in  our  formulation,  it 
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is  not  required  that  the  same  set  of  points  be  used.  Therefore,  we  only  need  to  focus  on  finding 
matching  points  between  two  frames.  A  different  set  of  points  can  be  used  whenever  this  is  desirable. 


6.2  Synthetic  Experiments 


This  section  tests  the  results  of  Section  4;  we  illustrate  the  performance  of  the  kinematic  law  using 
a  one-dimensional  example,  and  show  that  both  laws  yield  comparable  performance  when  sufficient 
numbers  of  visual  observations  are  present.  Consider  the  scenario  in  which  the  vehicle  travels  along 
a  straight  path  most  of  the  time  except  when  encountering  impulse-like  disturbances  (bumps).  Each 
bump  is  modeled  by  a  half  sine  wave  with  a  given  height  and  width.  (For  simplicity,  no  smooth 
rotational  motion  during  driving  is  assumed.)  The  excitations  to  the  vehicle  are  then  given  by 


Xoi(t)  =  < 


^hsml^(vt  -  zo)] 

0 


V  —  —  V 

elsewhere 


with 


^02(i)  =  ^03(i)  =  ^oi(t  -  “),  xo4(i)  =  3:o2(t  -  -) 

where  zq?  L,  and  v  respectively  denote  the  location  of  the  bump,  the  vehicle’s  length  and  its  forward 
speed.  (Only  one  bump  is  used  in  this  example,  with  Zq  =  1.345  m,  bh  =  0.1  m,  hyj  —  0,2  m,  i  == 
2.7  m  and  v  =  13.45  m  •  s““^).  The  nominal  values  of  the  vehicle  parameters  are  listed  in  Table  1, 
and  the  oscillatory  motion  is  synthesized  according  to  (33)  with  only  the  pitching  motion  shown  in 
Figures  4a  and  4b.  During  driving,  a  sequence  of  images  is  acquired  at  20  ifz  from  an  on-board 
camera  (each  image  has  size  2.0  x  2.0  and  resolution  2000  x  2000),  and  a  set  of  distant  points  is 
tracked;  the  coordinates  of  these  points  with  respect  to  the  initial  camera  frame  of  reference  are 
listed  in  Table  2. 


Table  1:  Model  parameters.  the  mass  of  the  sprung  element,  lyy:  the  moment  of  inertia  along 
the  pitch  axis,  Wa  and  W^:  the  distance  of  the  center  of  gravity  to  the  front  and  rear  ends. 


Mb 

lyy 

Kt 

1710.0kg 

1031.3kg-m'' 

57.5kg 

75.0kg 

200.0kN-m-i 

Kj 

Kr 

Q  (C.) 

Wa 

Wb 

18.0kN-m-i 

lO.OkN-m-i 

l.OkN-m  ^-s  ^ 

1.4m 

1.3m 

Subsequently,  to  compare  the  kinetic  and  and  kinematic  law  based  estimators,  we  design  a 
recursive-type  estimator  similar  to  the  one  used  in  the  real  application  to  estimate  the  pitching 
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X  10'’ 


(c)  (d) 

Figure  4:  Kinetic  versus  kinematic:  (a)  (b)  d{t),  (c)  Bias  in  the  estimates  of  (d)  RMSE 

in  the  estimates  of  0d{t)- 

motion.  The  resulting  bias  and  Root-Mean-Squared  Error  (RMSE)  in  the  estimates  of  the  attitude 
change  between  t  and  t  +  At,  9d{t)  =  9{t)  -  9{t  -  At),  are  shown  in  Figures  4c  and  4d.  As  seen 
from  the  figures,  when  reliable  image  cues  are  available,  the  performance  of  the  kinematic-law 
based  estimator  (dashed  line)  is  very  close  to  that  of  the  kinetic-law  based  one  (solid  line).  More 
importantly,  the  larger  bias  in  the  kinematic-law  based  estimator,  due  to  the  modeling  error  in  the 
time  evolution  of  the  parameters,  is  negligible. 
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Table  2:  Distant  point  locations. 


Point 

3-D  coordinates 

1 

84 

-84  1200 

2 

160 

-80  2000 

3 

170 

115  2300 

4 

100 

140  1750 

6.3  Real  Imagery 

We  show  the  results  of  the  application  of  the  stabilization  schemes  to  two  sequences  with  significant 
unstabilized  components  in  both  calibrated  and  uncalibrated  cases.  To  let  the  reader  more  precisely 
judge  the  results  yielded  by  this  technique,  we  have  placed  MPEG  movies  showing  the  original 
sequences,  the  primitive  tracking  and  the  stabilized  sequences  at  the  following  URL: 

http: //www.cfar. umd.edu/  '  yao/stabilization.html. 

They  can  easily  be  viewed  using  any  web-browser  (Mosaic,  NetScape,  etc). 

6.3.1  Martin  Marietta  Sequence 

The  first  sequence,  distributed  by  Martin  Marietta,  was  obtained  from  a  vehicle  performing  off-road 
navigation.  Figure  5  shows  four  frames  from  the  sequence  in  which  each  frame  has  size  347  X  238. 
The  motion  is  composed  of  translation  (with  a  dominant  looming  component)  and  unstabilized 
rotation.  The  FOV  is  40  x  30  degrees  (in  the  horizontal  and  vertical  directions,  respectively), 
while  the  optical  axis  intersects  the  image  plane  at  the  center  of  the  image.  The  experiments  were 
first  carried  out  for  the  calibrated  case.  In  this  sequence,  according  to  the  detection  scheme,  both 
the  mountain  profile  and  the  line  near  the  horizon  are  detected.  Figure  6a  shows  the  results  for 
one  frame.  (The  mountain  profile  is  not  employed  by  the  current  algorithm.)  Four  feature  points 
close  to  the  detected  line  are  identified  and  tracked  over  the  sequence  afterwards.  For  display 
purposes,  the  resulting  image  plane  trajectories  are  superimposed  on  the  last  frame  of  the  sequence 
as  shown  in  Figure  6b.  The  LS  estimate  of  the  rotation  is  also  computed  as  if  only  the  horizon  line 
were  available.  Subsequently,  stabilization  from  both  fuU  and  LS  rotation  is  applied;  the  recursive 
algorithm,  in  addition,  uses  the  estimates  from  the  batch  algorithm  computed  from  the  first  ten 
frames.  Figures  6c  and  6d  respectively  show  the  LS  and  full  angular  velocity  estimates.  The  sohd 
line,  the  dashed  line  and  the  dashed-dotted  line  respectively  correspond  to  the  pitch  motion,  the 
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yaw  motion  and  the  roll  motion.  Since  the  vehicle  exhibits  very  small  lateral  motion  and  the 
sequence  is  densely  sampled,  the  LS  estimates  are  therefore  close  to  the  full  estimates  for  this 
sequence.  Finally,  a  stabihzed  sequence  is  obtained  using  the  global  stabilization  scheme  with  the 
full  rotational  estimates.  The  stabilized  sequence  appears  to  undergo  translation  only,  with  the 
direction  of  translation  varying  noticeably  when  the  rotational  component  in  the  original  sequence 
is  large. 


(c)  (d) 


Figure  5:  Martin  Marietta  sequence:  (a)  Image  9,  (b)  Image  39,  (c)  Image  74,  (d)  Image  99. 

The  stabilization  scheme  for  the  uncalibrated  case  was  also  applied,  assuming  that  the  intrinsic 
parameters  are  unknown.  The  sensitivity  of  the  eight  parameters  with  respect  to  the  four  intrinsic 
parameters,  obtained  using  the  cahbrated  parameters  as  nominal  values,  are  first  shown  in  Figures  7 


frame  number  frame  number 


(c)  (d) 

Figure  6:  Experimental  results  from  the  calibrated  scheme  for  the  Martin  Marietta  sequence:  (a) 
The  detected  line  and  the  mountain  profile,  (b)  Feature  point  trajectories,  (c)  The  LS  estimates: 
(jJx  (soMd  line),  Uy  (dashed  line),  Wj  (dashed-dotted  line),  (d)  The  full-stabilization  estimates. 

and  8.®  (The  additional  parameter  is  due  to  the  different  focal  length  in  either  direction.)  For  visual 
purposes,  we  only  illustrate  the  sensitivity  of  each  parameter  with  respect  to  the  focal  length  in  the 
horizontal  direction  and  Xc-  As  argued  earher,  these  values  are  small  and  therefore  a  reasonable 
deviation  from  the  nominal  values  of  the  intrinsic  parameters  is  not  critical.  Consequently,  we  vary 
the  parameters  and  perform  different  tests.  For  example,  assume  the  focal  length  in  pixels  to  be 
700  and  600  in  the  horizontal  and  vertical  directions  respectively  (the  real  values  being  480  and 

® Instead  of  showing  the  computed  J,  Figures  7  and  8  show  the  relative  sensitivity,  defined  as  J  •  A,  with  A  a 
diagonal  matrix  whose  elements  are  Aq. 
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443  respectively),  and  the  image  center  to  be  50  pixels  away  from  the  true  position,  say  (100,  100). 
Then  using  the  same  visual  cues.  Figure  9  shows  the  estimated  angular  velocity.  The  uncalibrated 
stabilization  scheme  is  thereafter  applied  to  the  original  sequence.  It  can  be  seen  from  the  MPEG 
movie  that  the  result  is  close  to  the  one  obtained  using  the  calibrated  scheme.  The  result  for  the 
uncalibrated  sequence  using  unconstrained  projectivity  parameters,  i.e.  direct  estimation  of  the 
eight  parameters  from  the  image  features,  leads  instead  very  poor  results,  since  the  estimation  is 
unstable. 


xltf’ 


(a) 


(b) 


(c) 


(d) 


Figure  7:  Sensitivities  of  A  with  respect  to  fc  in  the  horizontal  direction  (solid  line)  and  Xg  (dashed 
line)  for  the  Martin  Marietta  sequence:  (a)  an,  (b)  ai2,  (c)  021,  (d)  022- 
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(c)  (d) 


Figure  8:  Sensitivities  of  b  and  c  with  respect  to  fc  in  the  horizontal  direction  (solid  line)  and  Xc 
(dashed  line)  for  the  Martin  Marietta  sequence:  (a)  6i,  (b)  62,  (c)  ci,  (d)  C2. 

6.3.2  NIST  Sequence 

The  same  procedure  was  apphed  to  another  sequence,  provided  by  NIST,  which  was  also  acquired 
from  an  off-road  vehicle.  Figure  10  shows  four  frames  of  the  sequence  in  which  each  frame  is  of 
size  640  x  480.  The  motion  is  composed  of  a  translation  with  steering  and  unstabilized  motion 
components.  The  FOV  is  70  and  60  degrees  along  the  horizontal  and  vertical  directions  respectively, 
while  the  optical  axis  again  intersects  the  image  plane  at  the  center  of  each  image.  One  line 
and  partial  object  profiles  near  the  horizon  are  detected  as  displayed  in  Figure  11a,  while  the 
feature  trajectories  are  similarly  superimposed  on  the  last  frame  as  shown  in  Figure  11b.  The  LS 
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(b)  (c) 


Figure  9:  Angular  velocity  estimates  from  the  uncalibrated  scheme  (solid  Mne)  versus  calibrated 
scheme  (dashed  hne)  for  the  Martin  Marietta  sequence:  (a)  uixi  (b)  Wy,  (c)  Uz- 

and  full  estimates  of  the  angular  velocity  are  plotted  in  a  similar  fashion  in  Figures  11c  and  lid 
respectively.  Since  the  sequence  exhibits  some  steering,  the  LS  and  full  estimates  are  quite  different. 
In  addition,  the  steering  behavior  is  preserved  in  the  stabilized  sequence;  only  the  unstabilized 
motion  is  removed. 

We  also  applied  the  uncalibrated  stabilization  scheme  to  this  sequence.  Again,  we  choose  the 
focal  length  in  pixels  to  be  600  and  700  (in  contrast  with  the  true  values  249  and  457),  and  the 
center  of  the  image  to  be  (250,  300).  The  sensitivity  of  the  projective  coefficients  is  shown  in 
Figures  12  and  13.  The  angular  velocity  estimates  are  illustrated  in  Figure  14.  The  MPEG  movie 
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(c)  (d) 

Figure  10:  NIST  sequence:  (a)  Image  10,  (b)  Image  35,  (c)  Image  60,  (d)  Image  80. 
again  shows  good  stabilization  results  for  this  sequence. 


7  Conclusion 

This  paper  has  presented  a  scheme  for  stabilizing  a  sequence  acquired  by  a  moving  observer.  In 
particular,  the  algorithm  exploits  the  temporal  information  in  the  sequence  and  utilizes  various 
visual  cues.  Image  warpings  derived  from  the  3-D  motion  are  used  instead  of  other  approximate 
2-D  mappings,  thereby  capturing  more  closely  the  image  motion  resulting  from  the  camera  rotation. 
The  consideration  of  global  stabilization,  in  addition,  makes  the  algorithm  more  suitable  to  real 
appbcations.  We  analyze  the  nature  of  the  resulting  sequence.  The  use  of  kinetic-laws  for  estimation 
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Figure  11:  Experimental  results  from  the  calibrated  scheme  for  the  NIST  sequence:  (a)  The  detected 
line  and  partial  object  profiles,  (b)  Feature  point  trajectories,  (c)  The  LS  estimates:  (solid  line), 

u)y  (dashed  line),  (dashed- dotted  line),  (d)  The  full-stabilization  estimates. 

purposes  is  carefully  justified  and  tested  in  synthetic  cases.  The  introduction  of  kinetic  laws  leads 
us  to  study  the  feasibility  of  selective  stabilization  schemes,  or  the  exclusive  removal  of  oscillatory 
motion  components.  Lastly,  we  study  the  removal  of  rotational  motion  from  uncalibrated  sequences, 
for  which  a  scheme  to  alleviate  the  unstable  estimation  of  the  projectivity  coefficients  is  proposed 
and  tested.  All  results  have  been  thoroughly  tested  and  the  resulting  sequences  have  been  made 
available  to  the  interested  reader. 
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(a) 


xlC’ 


X  10’’ 


(b) 


(c) 


(d) 


Figure  12:  Sensitivities  of  A  with  respect  to  fc  in  the  horizontal  direction  (solid  line)  and  Xc  (dashed 
line)  for  the  NIST  sequence:  (a)  an,  (b)  ai2,  (c)  021,  (d)  022- 
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Appendix  A 


We  give  the  derivation  of  image  motion  in  vector  form  in  this  appendix.  As  in  Section  2,  assume 
that  perspective  projection  is  used  as  the  imaging  model  and  the  camera  undergoes  translation  and 
rotation.  Then,  for  a  3-D  point  P  :  (X,y,  Z)'^,  the  image  motion  of  the  projection  point  p  :  (a:,  y)^ 
can  be  obtained  as  follows; 


p  =  jpP  +  p- 
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As  before,  define  pfoe  :  (/c^,  /c^)^  +  Pc  and  r  :  ^  as  weU  as  the  foHowing  matrices 
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Then  (78)  can  be  rewritten  as 
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Consequently,  we  have 


P  =  /c  Hp  -  Pc)‘^iy'(P  -  Pc)  +  ^*(p  “  Pc)  +  /c^4  “  (Pfoc  “  p)7 


(81) 
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Appendix  B 


We  derive  the  relationship  between  w,  the  projection  of  the  3-D  normal  vector  to  the  plane  II  (see 
Figure  2),  and  the  image  plane  normal  o  in  this  appendix.  Consider  an  image  horizon  line  denoted 
by 

C  :  y  =  a  +  bx  (82) 

where  (x,  y)^  are  the  image  plane  coordinates  of  a  3-D  point.  Then  the  image  plane  vector  normal 
to  C,  defined  in  (20),  is 

o  =  (83) 


a  a 


Let  fc  be  the  focal  length  of  the  camera  and  assume  that  the  optical  axis  intersects  the  image  plane 
at  Pc  :  (xc,?/c)^.  Then 

Pi  :  {0,a  +  bxc-  ycjcf 

P2  :  (^^^^,0,/c)^ 

are  the  coordinates  of  two  points  lying  on  the  plane  11,  with  respect  to  the  camera  frame  of  reference. 
Consequently,  by  taking  the  cross  product  of  Pi  and  P2,  we  have 


N  Pi  X  P2 

_  1  d  +  bxc  —  yc<.T 

~  ^  ’“6’  bU  ’ 


(85) 

(86) 


The  3-D  normal  vector  W  is  then  obtained  by 


W 


N 

IINil 


(87) 


w  is  related  to  o  by 


w  =  'P(W)-|-pc 
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